A PREDICTION INTERVAL FOR A FIRST ORDER GAUSSIAN MARKOV PROCESS
by 

Toke  Jayachandran  and  T.S.  Murthy 


Let $x_t$ ($t = 1, 2, \ldots$) be a stationary Gaussian Markov process of order one with $E(x_t) = \mu$ and $\mathrm{Cov}(x_t, x_{t+k}) = \sigma^2 \rho^k$. We derive a prediction interval for $x_{2n+1}$ based on the preceding $2n$ observations $x_1, x_2, \ldots, x_{2n}$.


1.  INTRODUCTION 

Consider a stationary Gaussian Markov process of order one with $E(x_t) = \mu$ and $\mathrm{Cov}(x_t, x_{t+k}) = \sigma^2 \rho^k$. For $\mu = 0$ such a process can be generated from an autoregressive model

$$x_t = \rho x_{t-1} + \epsilon_t, \qquad t = 1, 2, \ldots \tag{1.1}$$

with $\{\epsilon_t\}$ a sequence of independent and identically distributed random variables with normal distributions $N(0, \sigma^2)$, $|\rho| < 1$ and $x_0 = 0$. The process has many applications, such as in modelling certain economic and meteorological time series. In this paper, from a set of sample observations $x_1, x_2, \ldots, x_{2k}$, we construct a conditional prediction interval for $x_{2k+1}$, treating one half of the observations as conditioning variables. The effect of the parameters $\sigma^2$, $\rho$, $k$ and $\alpha$ (the prediction coefficient being $1 - \alpha$) on the prediction interval is also investigated.
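As a minimal sketch of how sample paths of model (1.1) might be generated, the Python function below follows the convention of the appendix program rather than the literal statement above: the first value is drawn as $\sigma z_1$ and the innovations are scaled by $\sqrt{1-\rho^2}$, so that every $x_t$ has variance $\sigma^2$. The function name `simulate_ar1` and the `seed` argument are illustrative assumptions, not part of the original report.

```python
import math
import random


def simulate_ar1(length, rho, sigma=1.0, seed=0):
    """Return [x_1, ..., x_length] from a zero-mean first order Gaussian
    Markov process with Var(x_t) = sigma**2 and lag-one correlation rho.

    Assumption: as in the appendix program, x_1 = sigma * z_1 and the
    innovations are sigma * sqrt(1 - rho**2) * z_t with z_t ~ N(0, 1).
    """
    if abs(rho) >= 1.0:
        raise ValueError("need |rho| < 1 for a stationary process")
    rng = random.Random(seed)
    q = math.sqrt(1.0 - rho * rho)
    x = [sigma * rng.gauss(0.0, 1.0)]
    for _ in range(length - 1):
        x.append(rho * x[-1] + q * sigma * rng.gauss(0.0, 1.0))
    return x


path = simulate_ar1(length=15, rho=0.7)
print(len(path))
```

Starting from the stationary distribution avoids a burn-in period; starting from $x_0 = 0$ with innovation variance $\sigma^2$, as written in (1.1), would give a different marginal variance.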


2.  DERIVATION  OF  PREDICTION  INTERVAL 
For the stochastic process defined above it can be shown [1] that when $x_{2k-1}$, $k = 1, 2, \ldots, n+1$ are fixed, the $x_{2k}$, $k = 1, 2, \ldots, n$ are conditionally independent and are normally distributed with mean $a + b x'_k$ and variance $\sigma_0^2$, where

$$a = \mu (1 - \rho)^2 / (1 + \rho^2)$$

$$b = 2\rho / (1 + \rho^2)$$

$$x'_k = (x_{2k-1} + x_{2k+1}) / 2$$

$$\sigma_0^2 = \sigma^2 (1 - \rho^2) / (1 + \rho^2).$$

Conditionally, it may therefore be assumed that the $x_{2k}$ satisfy the simple linear regression model

$$x_{2k} = a + b x'_k + e_k, \qquad k = 1, 2, \ldots, n \tag{2.1}$$

where $\{e_k\}$ are i.i.d. $N(0, \sigma_0^2)$.
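The conditional-regression parameters above are simple functions of $\mu$, $\sigma^2$ and $\rho$. The short Python sketch below evaluates them for one illustrative parameter choice; the function name `conditional_parameters` is an assumption introduced here for illustration.

```python
def conditional_parameters(mu, sigma2, rho):
    """Return (a, b, sigma0_sq) for the conditional regression of
    x_{2k} on x'_k = (x_{2k-1} + x_{2k+1}) / 2."""
    denom = 1.0 + rho * rho
    a = mu * (1.0 - rho) ** 2 / denom
    b = 2.0 * rho / denom
    sigma0_sq = sigma2 * (1.0 - rho * rho) / denom
    return a, b, sigma0_sq


a, b, sigma0_sq = conditional_parameters(mu=0.0, sigma2=1.0, rho=0.5)
print(a, b, sigma0_sq)
```

A quick consistency check is that $a + b\mu = \mu$ for any $\rho$: conditioning on neighbours that both equal the mean returns the mean.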

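Given the sample observations $x_1, x_2, \ldots, x_{2n}, x_{2n+1}$, from standard regression theory the parameters $a$, $b$ and $\sigma_0^2$ in (2.1) can be estimated using the first $2n-1$ observations as

$$\hat{b} = s_{xy} / s_{xx}$$

$$\hat{a} = \bar{x}_2 - \hat{b}\,\bar{x}' \tag{2.2}$$

$$\hat{\sigma}_0^2 = (s_{yy} - \hat{b}^2 s_{xx}) / (n - 3)$$

where

$$\bar{x}' = (x_1 + 2x_3 + \ldots + 2x_{2n-3} + x_{2n-1}) / (2(n-1))$$

$$\bar{x}_2 = \sum_{k=1}^{n-1} x_{2k} / (n-1)$$

$$s_{xx} = \sum_{k=1}^{n-1} {x'_k}^2 - (n-1)\,\bar{x}'^2$$

$$s_{yy} = \sum_{k=1}^{n-1} x_{2k}^2 - (n-1)\,\bar{x}_2^2$$

$$s_{xy} = \sum_{k=1}^{n-1} x'_k x_{2k} - (n-1)\,\bar{x}'\,\bar{x}_2.$$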
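The estimates in (2.2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: the function name `fit_conditional_regression` and the dictionary keys are hypothetical, and the sums of squares are computed in centred form, which is algebraically the same as the expressions above.

```python
def fit_conditional_regression(x):
    """Estimate a, b and sigma_0^2 of model (2.1) from x = [x_1, ..., x_{2n-1}].

    The list must have odd length 2n - 1 with n >= 4 so that n - 3 >= 1.
    """
    m = len(x)
    if m % 2 == 0 or m < 7:
        raise ValueError("need an odd number of observations, at least 7")
    n = (m + 1) // 2
    # x'_k = (x_{2k-1} + x_{2k+1}) / 2 and y_k = x_{2k}, k = 1..n-1
    xp = [(x[2 * k - 2] + x[2 * k]) / 2.0 for k in range(1, n)]
    y = [x[2 * k - 1] for k in range(1, n)]
    xp_bar = sum(xp) / (n - 1)
    y_bar = sum(y) / (n - 1)
    s_xx = sum((u - xp_bar) ** 2 for u in xp)
    s_yy = sum((v - y_bar) ** 2 for v in y)
    s_xy = sum((u - xp_bar) * (v - y_bar) for u, v in zip(xp, y))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * xp_bar
    sigma0_sq_hat = (s_yy - b_hat ** 2 * s_xx) / (n - 3)
    return {"n": n, "a_hat": a_hat, "b_hat": b_hat,
            "sigma0_sq_hat": sigma0_sq_hat, "xp_bar": xp_bar,
            "y_bar": y_bar, "s_xx": s_xx}


# Even-numbered values built as 2 + 3 * x'_k, so the fit is exact.
est = fit_conditional_regression([1, 6.5, 2, 11, 4, 18.5, 7, 29, 11])
print(est["a_hat"], est["b_hat"])
```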


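If $\hat{x}_{2n}$ is the least squares predictor of $x_{2n}$, i.e., $\hat{x}_{2n} = \hat{a} + \hat{b} x'_n$, then

$$\frac{x_{2n} - \hat{x}_{2n}}{\hat{\sigma}_0 \left[ 1 + \dfrac{1}{n-1} + \dfrac{(x'_n - \bar{x}')^2}{s_{xx}} \right]^{1/2}}$$

has a Student's t-distribution with $n-3$ degrees of freedom; hence

$$P\left\{ |x_{2n} - \hat{x}_{2n}| < t\,\hat{\sigma}_0 \left[ 1 + \frac{1}{n-1} + \frac{(x'_n - \bar{x}')^2}{s_{xx}} \right]^{1/2} \right\} = 1 - \alpha$$

where $t$ is the $100(1 - \alpha/2)$th percentage point of the Student's t-distribution with $n-3$ degrees of freedom. The above probability statement can be converted into a prediction interval for $x_{2n+1}$ by noting that $\hat{x}_{2n}$ is a function of $x'_n = (x_{2n-1} + x_{2n+1})/2$, as shown below.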

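Squaring the inequality and rearranging terms, the above probability statement can be expressed as

$$P\left[ \left( (x_{2n} - \bar{x}_2) - \hat{b}(x'_n - \bar{x}') \right)^2 < t^2 \hat{\sigma}_0^2 \left( \frac{n}{n-1} + \frac{(x'_n - \bar{x}')^2}{s_{xx}} \right) \right] = 1 - \alpha \tag{2.3}$$

or

$$P\left[ A(x'_n - \bar{x}')^2 + B(x'_n - \bar{x}') + C < 0 \right] = 1 - \alpha \tag{2.4}$$

where

$$A = \hat{b}^2 - \frac{t^2 \hat{\sigma}_0^2}{s_{xx}}$$

$$B = -2\hat{b}(x_{2n} - \bar{x}_2) \tag{2.5}$$

$$C = (x_{2n} - \bar{x}_2)^2 - \frac{n t^2 \hat{\sigma}_0^2}{n-1}.$$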
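The coefficients in (2.5) can be computed directly from the regression estimates. The Python sketch below is illustrative only; the function name `quadratic_coefficients` and its argument names are assumptions, and the percentage point `t` is passed in rather than computed. The numerical values in the example are arbitrary.

```python
def quadratic_coefficients(b_hat, sigma0_sq_hat, s_xx, x_2n, y_bar, n, t):
    """Return (A, B, C) of the quadratic in u = x'_n - xbar' from (2.4).

    y_bar is the mean of x_2, x_4, ..., x_{2n-2}; t is the 100(1 - alpha/2)th
    percentage point of Student's t with n - 3 degrees of freedom.
    """
    ss = t * t * sigma0_sq_hat      # t^2 * sigma_0^2 estimate
    p = x_2n - y_bar                # x_{2n} - xbar_2
    A = b_hat ** 2 - ss / s_xx
    B = -2.0 * b_hat * p
    C = p * p - n * ss / (n - 1)
    return A, B, C


A, B, C = quadratic_coefficients(b_hat=0.8, sigma0_sq_hat=0.6, s_xx=10.0,
                                 x_2n=1.5, y_bar=0.5, n=7, t=2.776)
print(A, B, C)
```

Here 2.776 is the tabulated 97.5th percentage point of Student's t with 4 degrees of freedom, matching $n = 7$ and $\alpha = .05$.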


A prediction "interval" for $x'_n = (x_{2n-1} + x_{2n+1})/2$, and in turn for $x_{2n+1}$, is now obtainable in terms of the roots of the quadratic expression in (2.4). If $B^2 - 4AC < 0$, i.e., the roots are complex, the prediction interval will be taken to be $(-\infty, \infty)$; in the other cases the "interval" can turn out to be a two sided interval, a one sided interval or the union of two one sided intervals. The different possibilities will now be examined in detail.
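The root-based construction described above can be sketched in Python as follows. The function name `interval_from_quadratic`, the string labels and the argument names are illustrative assumptions; the sketch solves the quadratic of (2.4) in $u = x'_n - \bar{x}'$ and maps each root to $x_{2n+1} = 2(\bar{x}' + u) - x_{2n-1}$. The zero-probability case $A = 0$ is not handled.

```python
import math


def interval_from_quadratic(A, B, C, xp_bar, x_prev):
    """Return (label, pieces) for the prediction region of x_{2n+1}.

    xp_bar is the mean of the x'_k and x_prev is x_{2n-1}.
    pieces is a list of (lower, upper) tuples.
    """
    if A == 0:
        raise ValueError("A = 0 is not treated in this sketch")
    disc = B * B - 4.0 * A * C
    if disc < 0:
        # Complex roots: the interval is taken to be the whole real line.
        return "whole-line", [(-math.inf, math.inf)]
    root = math.sqrt(disc)
    u_values = [(-B - root) / (2.0 * A), (-B + root) / (2.0 * A)]
    lo, hi = sorted(2.0 * (xp_bar + u) - x_prev for u in u_values)
    if A > 0:
        # Quadratic is negative between its roots.
        return "two-sided", [(lo, hi)]
    # A < 0: quadratic is negative outside its roots.
    return "union", [(-math.inf, lo), (hi, math.inf)]


print(interval_from_quadratic(1.0, 0.0, -1.0, xp_bar=3.0, x_prev=1.0))
```

With $A = 1$, $B = 0$, $C = -1$ the roots are $u = \pm 1$, so the example region is the single interval from 3 to 7.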


3.  PROPERTIES  OF  THE  PREDICTION  INTERVAL 


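Case 1: $A > 0$

Let

$$F = \frac{\hat{b}^2 s_{xx}}{t^2 \hat{\sigma}_0^2}.$$

Then $A > 0 \iff F > 1$. Also, $B^2 - 4AC > 0 \iff$

$$F > 1 - \frac{(n-1)(x_{2n} - \bar{x}_2)^2}{n t^2 \hat{\sigma}_0^2}.$$

Hence, $A > 0 \implies B^2 - 4AC > 0$ and the prediction interval for $x_{2n+1}$ will be of the form $(D - E, D + E)$ where, from the roots of the quadratic in (2.4),

$$D = 2\bar{x}' - x_{2n-1} + \frac{2\hat{b}(x_{2n} - \bar{x}_2)}{A}$$

$$E = \frac{2}{A} \left[ \frac{t^2 \hat{\sigma}_0^2 (x_{2n} - \bar{x}_2)^2}{s_{xx}} + \frac{n A t^2 \hat{\sigma}_0^2}{n-1} \right]^{1/2}.$$
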

Case 2: $A < 0$, $B^2 - 4AC > 0$

$A < 0$ and $B^2 - 4AC > 0 \iff$

$$1 - \frac{(n-1)(x_{2n} - \bar{x}_2)^2}{n t^2 \hat{\sigma}_0^2} < F < 1$$

and the prediction interval will be the union of two non-overlapping intervals $(-\infty, D + E)$ and $(D - E, \infty)$. Note that in this case $E < 0$.

Case 3: $A < 0$, $B^2 - 4AC < 0$

As indicated earlier, the roots will be complex and the prediction interval is defined to be $(-\infty, \infty)$. We will call the prediction intervals resulting from the above three cases a type 1, type 2 and a type 3 interval, respectively. There are two other cases, viz., $A = 0$ and $B^2 - 4AC = 0$, and we have ignored these possibilities since their probability of occurrence is zero. It should be clear that the following identity holds for the prediction coefficient $1 - \alpha$:

$$\sum_{i=1}^{3} P[\text{an interval of type } i \text{ is obtained}] \cdot P[\text{the interval will contain } x_{2n+1} \mid \text{type } i] = 1 - \alpha.$$
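As a worked check of this identity, the Python sketch below combines the empirical frequencies and inclusion probabilities reported in Table I for $n = 7$, $\rho = .1$ (type 1: .193 with inclusion .917; type 2: .066 with inclusion .576; type 3: .741 with inclusion 1). The function name `overall_coefficient` is an assumption introduced for illustration.

```python
def overall_coefficient(p_type, p_contain):
    """Sum over interval types of P(type i) * P(contains x_{2n+1} | type i)."""
    return sum(p * c for p, c in zip(p_type, p_contain))


# Table I, n = 7, rho = .1; a type 3 interval always contains x_{2n+1}.
value = overall_coefficient([0.193, 0.066, 0.741], [0.917, 0.576, 1.0])
print(round(value, 3))  # 0.956
```

The result agrees with the empirical prediction coefficient of .956 listed in the same row of the table.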


To study the effect of the parameters $n$, $\rho$, $\sigma^2$ and $\alpha$ on the probability of occurrence of the different types of intervals we conducted a simulation. For each choice of the parameter values, $2n+1$ samples are generated from the autoregressive process (1.1) and the prediction interval for $x_{2n+1}$ is calculated using the first $2n$ values. We then calculated the empirical frequencies of the three types of intervals in 1000 replications, and also, for each type of interval, the frequency of inclusion of $x_{2n+1}$. In Table 1 we present the results for $\alpha = .05$, $\sigma = 1.0$, $n = 14, 22, 30, 38$ as $\rho$ takes on the values .1, .3, .5, .7, .9. Table 2 shows the effect of increasing $n$ as the other parameters are held fixed. In Table 3 the standard deviation $\sigma$ is varied from 1 to 5 while the other parameter values are fixed. Some of the results are also presented in graphical form in figures 1-5.
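A simulation of this kind might be sketched in Python as below. It is not a translation of the appendix program: the function names `classify` and `type_frequencies`, the `seed` argument and the stationary start are assumptions, only the type frequencies are tallied (the inclusion frequencies are omitted for brevity), and the percentage point `t` is supplied by the caller. The classification uses the conditions on $F$ from Section 3.

```python
import math
import random


def classify(x, t):
    """Return 1, 2 or 3, the interval type obtained from x = [x_1, ..., x_{2n}]."""
    n = len(x) // 2
    xp = [(x[2 * k - 2] + x[2 * k]) / 2.0 for k in range(1, n)]  # x'_k
    y = [x[2 * k - 1] for k in range(1, n)]                      # x_{2k}
    xb = sum(xp) / (n - 1)
    yb = sum(y) / (n - 1)
    s_xx = sum((u - xb) ** 2 for u in xp)
    s_yy = sum((v - yb) ** 2 for v in y)
    s_xy = sum((u - xb) * (v - yb) for u, v in zip(xp, y))
    b_hat = s_xy / s_xx
    ss = t * t * (s_yy - b_hat ** 2 * s_xx) / (n - 3)
    p = x[2 * n - 1] - yb                                        # x_{2n} - xbar_2
    f = b_hat ** 2 * s_xx / ss
    if f > 1.0:
        return 1                                                 # A > 0
    if f > 1.0 - (n - 1) * p * p / (n * ss):
        return 2                                                 # A < 0, real roots
    return 3                                                     # complex roots


def type_frequencies(n, rho, t, sigma=1.0, reps=1000, seed=0):
    """Empirical frequencies of the three interval types over reps sample paths."""
    rng = random.Random(seed)
    q = math.sqrt(1.0 - rho * rho)
    counts = {1: 0, 2: 0, 3: 0}
    for _ in range(reps):
        x = [sigma * rng.gauss(0.0, 1.0)]
        for _ in range(2 * n - 1):
            x.append(rho * x[-1] + q * sigma * rng.gauss(0.0, 1.0))
        counts[classify(x, t)] += 1
    return {k: c / reps for k, c in counts.items()}


# t = 2.179 is the tabulated 97.5th percentage point for 12 degrees of freedom.
print(type_frequencies(n=15, rho=0.9, t=2.179))
```

Because $F$ and the ratio $(x_{2n} - \bar{x}_2)^2 / (t^2\hat{\sigma}_0^2)$ are unchanged when every observation is multiplied by a constant, the type of interval obtained from a given path does not depend on $\sigma$.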

The following general conclusions can be drawn from the results of the simulation. The probability of obtaining a type 1 interval increases with $\rho$, $n$ and $\alpha$. For $n > 15$ (30 or more samples) and $\rho > .5$ the probability of a type 1 interval is of the order of .85. The standard deviation $\sigma$ does not appear to have any effect on this probability.


4.  AN  EXAMPLE 


The following data represent the monthly Dow-Jones industrial averages for the years 1966-67.

          1966                  1967

    Jan 31   983.51       Jan 31   879.87
    Feb 28   951.89       Feb 28   839.37
    Mar 31   924.77       Mar 31   865.98
    Apr 30   933.68       Apr 28   897.05
    May 31   884.07       May 31   852.56
    Jun 30   870.10       Jun 30   860.26
    Jul 29   847.38       Jul 31   904.24
    Aug 31   788.41       Aug 31   901.29
    Sep 30   774.22       Sep 29   926.66
    Oct 31   807.07       Oct 31   879.74
    Nov 30   791.59       Nov 30   875.81
    Dec 30   785.69       Dec 29   905.11

Assuming that the data is generated by a Gaussian Markov process of order one (a calculation of lagged correlations supports the assumption with $\rho = .8$) we computed prediction intervals for March 1967, May 1967, July 1967, September 1967 and November 1967 based on all the preceding data and the results are presented below.

                      Lower         Upper         Length
                      Prediction    Prediction    of          True
    Month       n     Limit         Limit         Interval    Value

    Mar 67      7     727.11        1080.63       353.52      865.98
    May 67      8     598.39         900.56       302.16      852.56
    Jul 67      9     708.70        1009.10       300.40      904.24
    Sep 67     10     573.59         864.87       291.28      926.66
    Nov 67     11     633.12         899.73       266.61      875.81

All the intervals except the one for September 1967 contain the true value. As is to be expected, the length of the interval decreases with an increase in sample size.


REFERENCES 


[1]  Ogawara,  Masami  (1951),  "A  Note  on  the  Test  of  Serial  Correlation 


PROBABILITIES OF OCCURRENCE OF PREDICTION INTERVALS
OF TYPES 1,2,3 IN 1000 REPLICATIONS

TABLE I

$\sigma = 1$, $\alpha = .05$

                                                            Empirical
                                                            Prediction
     n    $\rho$   P(type 1)       P(type 2)       P(type 3)   Coefficient

     7     .1      .193 (.917)     .066 (.576)     .741        .956
           .3      .342 (.915)     .080 (.775)     .578        .953
           .5      .269 (.888)     .098 (.786)     .633        .949
           .7      .369 (.902)     .119 (.874)     .512        .949
           .9      .539 (.939)     .128 (.930)     .333        .958

    ...    .1      .492 (.935)     ...  (.438)     .460        .941
           .3      .516 (.924)     .077 (.714)     .407        .939
           .5      .483 (.909)     .110 (.818)     .407        .936
           .7      .791 (.934)     .071 (.873)     .138        .939
           .9      .902 (.947)     .050 (.920)     .048        .948

    15     .1      .554 (.960)     .036 (.722)     .410        .968
           .3      .861 (.940)     .033 (.758)     .106        .940
           .5      .874 (.952)     .044 (.841)     .082        .951
           .7      .914 (.951)     .028 (.929)     .058        .953
           .9      .979 (.952)     .008 (1.000)    .013        .953

    21     .1      .573 (.932)     .047 (.723)     .380        .948
           .3      .719 (.947)     .044 (.773)     .237        .952
           .5      .853 (.943)     .047 (.894)     .100        .946
           .7      .980 (.948)     .007 (.857)     .013        .948
           .9      .998 (.947)     .001 (1.000)    .001        .947

The numbers in parentheses are the probabilities that $x_{2n+1}$ is contained in the interval; for a type 3 interval this probability is always 1.


PROBABILITIES OF OCCURRENCE OF PREDICTION INTERVALS
OF TYPES 1,2,3 IN 1000 REPLICATIONS

TABLE II

$\sigma = 1$, $\alpha = .05$

                                                            Empirical
                                                            Prediction
   $\rho$   n      P(type 1)       P(type 2)       P(type 3)   Coefficient

    .3      3      .300 (.867)     .040 (.600)     .660        .944
            4      .342 (.915)     .080 (.775)     .578        .953
            5      .587 (.942)     .057 (.737)     .356        .951
            6      .516 (.924)     .077 (.714)     .407        .939
            7      .259 (.946)     .106 (.698)     .635        .954
            8      .861 (.940)     .033 (.758)     .106        .940

    .5      3      .232 (.853)     .049 (.694)     .719        .951
            4      .269 (.888)     .098 (.786)     .633        .949
            5      .540 (.935)     .083 (.819)     .377        .950
            6      .483 (.909)     .110 (.818)     .407        .936
            7      .561 (.941)     .109 (.853)     .330        .951
            8      .874 (.952)     .044 (.841)     .082        .951

    .7      3      .308 (.896)     .072 (.861)     .620        .958
            4      .369 (.902)     .119 (.874)     .512        .949
            5      .586 (.920)     .095 (.884)     .319        .942
            6      .791 (.934)     .071 (.873)     .138        .939
            7      .840 (.940)     .065 (.908)     .095        .944
            8      .914 (.951)     .028 (.929)     .058        .953


PROBABILITIES OF OCCURRENCE OF PREDICTION INTERVALS
OF TYPES 1,2,3 IN 1000 REPLICATIONS

TABLE III

$\alpha = .05$

                                                                   Empirical
                                                                   Prediction
                   $\sigma$   P(type 1)      P(type 2)      P(type 3)   Coefficient

    $n = 8$           1       .369 (.902)    .119 (.874)    .512        .949
    $\rho = .7$       2       .352 (.901)    .123 (.886)    .525        .951
                      3       .354 (.898)    .122 (.885)    .524        .950
                      4       .354 (.898)    .121 (.884)    .525        .950
                      5       .353 (.898)    .122 (.885)    .525        .950

    $n = 12$          1       .483 (.909)    .110 (.818)    .407        .936
    $\rho = .5$       2       .446 (.901)    .115 (.826)    .439        .936
                      3       .445 (.899)    .117 (.828)    .438        .935
                      4       .446 (.899)    .115 (.826)    .439        .935
                      5       .444 (.901)    .115 (.817)    .441        .935

[Figure: probability of a type 1 interval (axis label "PROB. OF TYPE 1 INT"); plot not recoverable]


APPENDIX 


C     PROGRAM TO CALCULATE PROBABILITY OF TYPE 1, TYPE 2, TYPE 3,
C     TYPE 4 INTERVALS FOR SIMULATED SAMPLES.
C     PROGRAMMER T S MURTHY SEP 1979.
C     ***************************************************
      DIMENSION Z(55),X(55),V(100,55),XI(50),YI(50),S(55),
     1IC(5),IR(5),IPC(5),STAT(10,7),VS(10,10,7),IVS(10)
      CALL OVFLOW
      INDEX=1
      SIGMA=1.0
C     READ SAMPLE LENGTH K AND T PERCENTAGE POINT; K=0 ENDS INPUT
    1 READ(5,2) K,T
      WRITE(6,2) K,T
    2 FORMAT(1X,I2,2X,F5.3)
      IF(K .EQ. 0) GO TO 460
      ROW=0.10
      DO 300 IB=1,5
      DO 10 J=1,5
      Q=(1-ROW**2)**.5
C     SIMULATION OF SAMPLES
      ISEED=12345
   10 IPC(J)=0
      DO 250 IA=1,10
      DO 50 M=1,100
      CALL SNORM(ISEED,Z,K)
      X(1)=SIGMA*Z(1)
      DO 30 J=2,K
   30 X(J)=ROW*X(J-1)+Q*SIGMA*Z(J)
      DO 40 L=1,K
   40 V(M,L)=X(L)
   50 CONTINUE
      DO 60 II=1,5
   60 IR(II)=0
      DO 200 I=1,100
      DO 70 J=1,K
   70 S(J)=V(I,J)
      K=K-5
      KK=K/2
C     ...


C     YI HOLDS THE EVEN-NUMBERED VALUES, XI THE NEIGHBOUR AVERAGES
      DO 80 L=1,KK
   80 YI(L)=S(2*L)
      N=KK-1
      DO 90 LL=1,N
   90 XI(LL)=(S(2*LL-1)+S(2*LL+1))/2.
      XSUM=0.0
      YSUM=0.0
      SXX=0.0
      SXY=0.0
      SYY=0.0
      DO 100 KL=1,5
  100 IC(KL)=0
      DO 110 M=1,N
      YSUM=YSUM+YI(M)
      XSUM=XSUM+XI(M)
  110 CONTINUE
      XB=XSUM/N
      YB=YSUM/N
      DO 120 M=1,N
      SXX=SXX+(XI(M)-XB)**2
      SYY=SYY+(YI(M)-YB)**2
      SXY=SXY+(XI(M)-XB)*(YI(M)-YB)
  120 CONTINUE
C     LEAST SQUARES ESTIMATES AND COEFFICIENTS OF THE QUADRATIC
      VRES=(SYY-((SXY**2)/SXX))/(N-2)
      BH=SXY/SXX
      AH=YB-BH*XB
      SS=(T**2)*VRES
      A=(BH**2)-SS/SXX
      P=S(2*KK)-YB
      B=-(2*BH*P)
      C=(P**2)-(SS*(N+1))/N
      F=(BH**2)*SXX/SS
      SSQ=1.0-N*(P**2)/(SS*(N+1))
      IF(F .LT. SSQ) GO TO 150
      IF(F .EQ. 1.0) GO TO 500
      D=(2.*XB)-S(K-1)+2.*BH*P/A
      E=(2./A)*(((SS*(P**2)/SXX)+(N+1)*A*SS/N)**.5)
      PIL=D-E
      PIR=D+E
      PVAL=S(K+1)
      IF(F .GT. 1.0) GO TO 130
C     TYPE 2 INTERVAL (UNION OF TWO ONE SIDED INTERVALS)
      IC(2)=IC(2)+1
      IF(PVAL.LE.PIR .OR. PVAL.GE.PIL) IC(5)=IC(5)+1
      GO TO 160
C     TYPE 1 INTERVAL (TWO SIDED)
  130 IC(1)=IC(1)+1
      IF(PIL.LE.PVAL.AND.PIR.GE.PVAL) IC(4)=IC(4)+1
      GO TO 160
C     TYPE 3 INTERVAL (COMPLEX ROOTS)
  150 IC(3)=IC(3)+1
  160 DO 170 J=1,5
  170 IR(J)=IR(J)+IC(J)
      K=K+5
  200 CONTINUE
      DO 220 J=1,5
  220 IPC(J)=IPC(J)+IR(J)
  250 CONTINUE
C     PRINT STATISTICS
      STAT(IB,1)=ROW
      STAT(IB,2)=IPC(1)/1000.0
      STAT(IB,3)=IPC(2)/1000.0
      STAT(IB,4)=IPC(3)/1000.0
      IF(STAT(IB,2) .EQ. 0.0) GO TO 275
      STAT(IB,5)=IPC(4)/(STAT(IB,2)*1000.0)
      GO TO 280
  275 STAT(IB,5)=IPC(4)
  280 IF(STAT(IB,3) .EQ. 0.0) GO TO 285
      STAT(IB,6)=IPC(5)/(STAT(IB,3)*1000.0)
      GO TO 290
  285 STAT(IB,6)=IPC(5)
  290 SIG=STAT(IB,2)*STAT(IB,5)+STAT(IB,3)*STAT(IB,6)
     1+STAT(IB,4)
      STAT(IB,7)=SIG
      ROW=ROW+0.2


  300 CONTINUE
      DO 305 IQ=1,5
      DO 305 JQ=1,7
      VS(INDEX,IQ,JQ)=STAT(IQ,JQ)
  305 CONTINUE
      IVS(INDEX)=N
      INDEX=INDEX+1
      ISZ=K-5
      WRITE(6,310) ISZ,SIGMA,N
      WRITE(6,325)
      WRITE(6,350) ((STAT(K,L),L=1,7),K=1,5)
  310 FORMAT(1X,' SAMPLE SIZE = ',I5,' SIGMA = ',F5.0,
     1' N = ',I5,/)
  325 FORMAT(1X,' ROW       TYPE 1    TYPE 2    TYPE 3    ',
     1'PROB.1    PROB.2    CON.DEG ',/,70('-'))
  350 FORMAT(7(F8.3,2X),/)
C     FORMAT 485 DOES NOT APPEAR IN THIS LISTING
      WRITE(6,485)
      GO TO 1
C     SUMMARY BY CORRELATION COEFFICIENT AFTER ALL INPUT IS READ
  460 CORR=0.1
      INDEX=INDEX-1
      DO 470 J=1,5
      WRITE(6,472) CORR,SIGMA
      WRITE(6,475)
      DO 471 I=1,INDEX
      WRITE(6,480) IVS(I),(VS(I,J,K),K=2,7)
  471 CONTINUE
      CORR=CORR+0.2
  470 CONTINUE
  472 FORMAT(' CORR.COEFF. = ',F5.3,' SIGMA= ',F5.0,//)
  475 FORMAT(' SAMPLE SIZE   TYPE 1    TYPE 2    TYPE 3    ',
     1'PROB.1    PROB.2    CON.DEG ',/,70('-'))
  480 FORMAT(I5,5X,6(F8.3,2X),/)
  500 STOP
      END